AITopics | efficient model-based reinforcement learning


Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning

Neural Information Processing Systems | Dec-24-2025, 21:53:24 GMT

Provably efficient Model-Based Reinforcement Learning (MBRL) based on optimism or posterior sampling (PSRL) is guaranteed to attain global optimality asymptotically, a guarantee obtained by introducing a complexity measure of the model. However, this complexity might grow exponentially even for the simplest nonlinear models, in which case global convergence is impossible within a finite number of iterations. When the model suffers a large generalization error, which is quantitatively measured by the model complexity, the uncertainty can be large. The sampled model on which the current policy is greedily optimized will thus be unsettled, resulting in aggressive policy updates and over-exploration. In this work, we propose Conservative Dual Policy Optimization (CDPO), which involves a Referential Update and a Conservative Update. The policy is first optimized under a reference model, which imitates the mechanism of PSRL while offering more stability. A conservative range of randomness is then guaranteed by maximizing the expectation of model value. Without harmful sampling procedures, CDPO can still achieve the same regret as PSRL. More importantly, CDPO enjoys monotonic policy improvement and global optimality simultaneously.

conservative dual policy optimization, efficient model-based reinforcement learning, model-based reinforcement learning, (6 more...)


Review for NeurIPS paper: Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Neural Information Processing Systems | Jan-27-2025, 05:32:28 GMT

Summary and Contributions: Conventionally, when rollout-based MBRL algorithms apply an optimistic exploration strategy like UCB, aleatoric and epistemic uncertainty are conflated into a single pointwise measure of uncertainty at each state in the rollout sequence. This submission proposes a novel augmented policy class that explicitly interacts with the model's epistemic uncertainty to hypothesize the best possible outcome for any particular action sequence. In addition to proof-of-concept experiments on easy MuJoCo control tasks, the authors provide regret bounds for their exploration strategy applied to purely rollout-based MBRL methods, including a sublinear regret bound for GP dynamics models. My greatest concern with this submission lies with the reproducibility of the results. There is no mention of code, and simple but crucial implementation details are missing.

efficient model-based reinforcement learning, neurips paper, optimistic policy search and planning, (6 more...)


Review for NeurIPS paper: Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Neural Information Processing Systems | Jan-27-2025, 05:32:22 GMT

In particular, the authors convert epistemic uncertainty into "hallucinated controls" that are optimized, thereby leading to optimistic behavior.
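To make the "hallucinated controls" idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the learned model returns a mean and an epistemic standard deviation for the next state, and that a hallucinated control `eta` in [-1, 1] picks a next state inside the confidence region `mean ± beta * std`. The names `toy_model`, `hallucinated_step`, `optimistic_next_state`, and the parameter `beta` are hypothetical, and the random search over `eta` is a stand-in for whatever optimizer would be used in practice.

```python
import numpy as np

def toy_model(state, action):
    """Hypothetical learned model: returns (mean, epistemic_std) of the next state."""
    mean = state + 0.1 * action
    epistemic_std = 0.05 * np.ones_like(state)
    return mean, epistemic_std

def hallucinated_step(state, action, eta, beta=1.0, model=toy_model):
    """Pick a next state inside the confidence region using eta in [-1, 1]."""
    eta = np.clip(eta, -1.0, 1.0)
    mean, std = model(state, action)
    return mean + beta * std * eta

def optimistic_next_state(state, action, reward_fn, beta=1.0,
                          model=toy_model, n_candidates=256, seed=0):
    """Random search over eta (an assumption) for the most rewarding plausible next state."""
    rng = np.random.default_rng(seed)
    dim = state.shape[0]
    etas = rng.uniform(-1.0, 1.0, size=(n_candidates, dim))
    # Include eta = 0 so the optimistic choice is never worse than the mean prediction.
    etas = np.vstack([np.zeros((1, dim)), etas])
    candidates = np.array([hallucinated_step(state, action, e, beta, model) for e in etas])
    rewards = np.array([reward_fn(c) for c in candidates])
    best = int(np.argmax(rewards))
    return candidates[best], etas[best]
```

With a reward that prefers states near some target, `optimistic_next_state` returns the plausible next state closest to that target, which is the sense in which treating `eta` as an extra control yields optimistic behavior.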

efficient model-based reinforcement learning, neurips paper, optimistic policy search and planning, (2 more...)


Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning

Neural Information Processing Systems | Jan-18-2025, 07:38:41 GMT

conservative dual policy optimization, efficient model-based reinforcement learning, model-based reinforcement learning, (3 more...)


Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Neural Information Processing Systems | Oct-10-2024, 23:41:57 GMT

Model-based reinforcement learning algorithms with probabilistic dynamical models are amongst the most data-efficient learning methods. This is often attributed to their ability to distinguish between epistemic and aleatoric uncertainty. However, while most algorithms distinguish these two uncertainties when learning the model, they ignore the distinction when optimizing the policy, which leads to greedy and insufficient exploration. At the same time, there are no practical solvers for optimistic exploration algorithms. In this paper, we propose a practical optimistic exploration algorithm (H-UCRL).
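For readers unfamiliar with the distinction between epistemic and aleatoric uncertainty, the sketch below shows one common way to separate them, using an ensemble of probabilistic models and the law of total variance. This is an illustrative assumption and not necessarily the model class used in the paper; the function name `decompose_uncertainty` and its inputs are hypothetical.

```python
import numpy as np

def decompose_uncertainty(means, variances):
    """Split predictive variance of an ensemble into epistemic and aleatoric parts.

    means, variances: arrays of shape (n_members, dim), where each ensemble
    member predicts a Gaussian over the next state.
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    # Aleatoric: average noise variance the members predict (irreducible noise).
    aleatoric = variances.mean(axis=0)
    # Epistemic: disagreement between the members' means (shrinks with more data).
    epistemic = means.var(axis=0)
    return epistemic, aleatoric
```

The sum of the two terms is the total predictive variance of the ensemble mixture. An exploration bonus built from that sum conflates the two sources, whereas an optimistic method of the kind described above would act only on the epistemic part.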

algorithm, efficient model-based reinforcement learning, optimistic policy search and planning, (2 more...)
